Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrades: add checkpointing for raftAppliedIndexTermMigration #85074

Merged
merged 1 commit into from
Jul 27, 2022

Conversation

erikgrinaker
Copy link
Contributor

Forward-port of #84909, for posterity.


The raftAppliedIndexTermMigration upgrade migration could be
unreliable. It iterates over all ranges and runs a Migrate request
which must be applied on all replicas. However, if any ranges merge or
replicas are unavailable, the migration fails and starts over from the
beginning. In large clusters with many ranges, this meant that it might
never complete.

This patch makes the upgrade more robust, by retrying each Migrate
request 5 times, and checkpointing the progress after every fifth batch
(1000 ranges), allowing resumption on failure. At some point this should
be made part of the migration infrastructure.

NB: This fix was initially submitted for 22.1, and even though the
migration will be removed for 22.2, it is forward-ported for posterity.

Release note: None

The `raftAppliedIndexTermMigration` upgrade migration could be
unreliable. It iterates over all ranges and runs a `Migrate` request
which must be applied on all replicas. However, if any ranges merge or
replicas are unavailable, the migration fails and starts over from the
beginning. In large clusters with many ranges, this meant that it might
never complete.

This patch makes the upgrade more robust, by retrying each `Migrate`
request 5 times, and checkpointing the progress after every fifth batch
(1000 ranges), allowing resumption on failure. At some point this should
be made part of the migration infrastructure.

NB: This fix was initially submitted for 22.1, and even though the
migration will be removed for 22.2, it is forward-ported for posterity.

Release note: None
@erikgrinaker erikgrinaker requested review from irfansharif and a team July 26, 2022 17:43
@erikgrinaker erikgrinaker self-assigned this Jul 26, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor Author

bors r=irfansharif

@craig
Copy link
Contributor

craig bot commented Jul 26, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 26, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 27, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 27, 2022

Build succeeded:

@craig craig bot merged commit 811ca1f into cockroachdb:master Jul 27, 2022
@erikgrinaker erikgrinaker deleted the forwardport-raft-migration branch July 27, 2022 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants